This paper aims to determine the efficiency of classifying and recognizing handwritten Thai digits using convolutional neural networks (CNN). We created a new dataset called the Thai digit dataset. The performance test was divided into two parts: the first part determines the appropriate number of epochs, and the second part examines overfitting in the model with the Keras library's EarlyStopping() function. The experiments were run on cloud computing with Google Colaboratory and implemented in the Python programming language. The main parameters for the model were a dropout of 0.75, a mini-batch size of 128, a learning rate of 0.0001, and an Adam optimizer. This study found that the model's predictive accuracy was 96.88% and the loss was 0.1075. The results showed that using CNN in image classification and

1. INTRODUCTION

In the information age, a great deal of information exists in the digital world, and much of it was created by hand and then imported into computers, for example as images of handwritten text. Once such text is imported, machines can be made to distinguish whose handwriting it is. Handwriting is a movement directed by the brain, possibly occurring unconsciously at the time of writing [1]. Recognition of human handwriting is increasingly important in the digital age because it is used in activities such as banking and mail sorting. In the past, it was believed that machines could not process complex tasks, but today machines can process complex tasks more easily and with high accuracy [2]. Recently, various classification systems have been developed for fields that require highly efficient classification or recognition [3]. Research on handwriting recognition is challenging because each person has a different writing style, even for the same letter [4]. The neural network within the human brain allows people to interpret different handwritten letters and numbers and to learn complex new things. A wide variety of research applies neural networks that simulate the human brain to make reading handwriting easier [5]. Handwriting recognition is still an open research problem. Handwriting is distinctive because it varies in many ways, such as letter shapes and each person's writing style. Identifying handwriting is very useful; in banking applications, for example, handwriting recognition can confirm the receipt or payment of money [6]. A system capable of recognizing and classifying handwritten objects helps prevent complex problems [5]. This has led to the development of applications and algorithms that can better examine and analyze the semantics of handwritten images [3]. Many algorithms are used for searching, comparing, classifying, and recognizing image data. Popular approaches to image classification include machine learning and deep learning algorithms [7]. Deep learning is a subset of machine learning whose architecture stacks several layers of nonlinear processing; by learning from a given dataset through a neural network, it can make decisions about new information. Deep learning processes large image datasets efficiently and with high accuracy, so it is often used for image classification, image and video processing, speech recognition, text prediction, handwriting recognition, and more [8]-[13]. One of the popular deep learning techniques for categorizing images is the convolutional neural network.

The purpose of this paper is to develop a model for classifying handwritten Thai digits using convolutional neural networks. Thai digits were invented during the reign of King Ramkhamhaeng the Great, 738 years ago. Thai digits originate from the Devanagari script of India, as well as from Arabic digits, and they are still used in Thai government offices today. Figure 1 compares Arabic digits with Thai digits.


Arabic digit   0  1  2  3  4  5  6  7  8  9
Thai digit     ๐  ๑  ๒  ๓  ๔  ๕  ๖  ๗  ๘  ๙

Figure 1. Arabic digits vs Thai digits
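Figure 1 pairs each Arabic digit with its Thai counterpart, and the same pairing gives the class labels 0-9 used later (for example in Table 3). The sketch below is not from the paper; it relies on the fact that the Thai digits occupy the contiguous Unicode range U+0E50 to U+0E59, so a class index can be converted to its Thai character and back. The function names are hypothetical.

```python
import unicodedata

THAI_DIGIT_ZERO = 0x0E50  # Unicode code point of "๐"; "๙" is 0x0E59


def label_to_thai(label):
    """Map a class index 0-9 to the corresponding Thai digit character."""
    if not 0 <= label <= 9:
        raise ValueError(f"class index must be 0-9, got {label}")
    return chr(THAI_DIGIT_ZERO + label)


def thai_to_label(ch):
    """Map a Thai digit character back to its class index."""
    if len(ch) != 1 or not THAI_DIGIT_ZERO <= ord(ch) <= THAI_DIGIT_ZERO + 9:
        raise ValueError(f"not a Thai digit: {ch!r}")
    return unicodedata.digit(ch)  # Unicode records the numeric value of each digit


print(" ".join(label_to_thai(i) for i in range(10)))  # ๐ ๑ ๒ ๓ ๔ ๕ ๖ ๗ ๘ ๙
```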


2. RESEARCH METHOD 
2.1. Convolutional neural network (CNN) 

CNN is one of the most popular deep learning methods for recognizing and classifying images and belongs to the supervised learning category [14], [15]. A CNN is a feed-forward neural network inspired by biology [16], [17]. It consists of neurons, or filters, whose weights and biases are trained to extract image features. A CNN has two parts: feature extraction and classification [18]. The basic architecture of a CNN, shown in Figure 2, consists of an input layer, convolution layers, pooling layers, and fully connected layers. There can be more than one convolution and pooling layer, and their output is sent to the fully connected layer [14], [19].


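[Figure 2 schematic: Input image → Convolutional layer → Pooling layer → Fully connected layer → Output layer (predicted class). The convolutional and pooling layers form the feature extraction stage; the fully connected and output layers form the classification stage.]

Figure 2. The basic architecture of convolutional neural network [13]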


An example of a convolution operation is shown in Figure 3. The input is multiplied element-wise by a feature detector (also called a filter or kernel) that is smaller than the input. The products are then added together, and the sum is placed in the corresponding cell of the feature map. The kernel is then moved to the right, and the process repeats until the kernel reaches the end of the row, after which it moves down and continues until it has covered all of the input data [12]. The step size of this sliding window must be specified to extract the features.
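The sliding-window procedure above can be sketched in plain Python. This is an illustrative implementation, not the paper's code: it assumes a single-channel input, no padding, and a configurable stride, and, like deep learning libraries, it computes cross-correlation (the kernel is not flipped). The 4x4 input and 3x3 kernel values are hypothetical, not the values in Figure 3.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` without padding and return the feature map.

    As in deep learning libraries, this is cross-correlation: the kernel is
    applied as-is, without flipping.
    """
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (in_h - k_h) // stride + 1
    out_w = (in_w - k_w) // stride + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the current window element-wise by the kernel and sum.
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 binary input and 3x3 diagonal kernel (not the Figure 3 values).
image = [[1, 0, 0, 1],
         [0, 1, 1, 0],
         [0, 1, 0, 1],
         [1, 0, 1, 0]]
kernel = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]
print(conv2d_valid(image, kernel))  # [[2, 2], [2, 1]]
```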

An example of max pooling is shown in Figure 4. Max pooling reduces the size of the output, and therefore the number of parameters that the network must learn [20]. First, the size of the pooling filter is chosen; then the maximum value within each region covered by the filter is taken as the output.
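Max pooling can be sketched the same way. This sketch assumes non-overlapping windows (stride equal to the pool size) and discards any leftover row or column, which is how a 25x25 map becomes 12x12 in Table 1; the feature-map values are hypothetical.

```python
def max_pool2d(feature_map, pool_size=2):
    """Non-overlapping max pooling (stride = pool_size, no padding).

    Rows or columns that do not fill a whole window are dropped, so a 25x25
    map becomes 12x12.
    """
    out_h = (len(feature_map) - pool_size) // pool_size + 1
    out_w = (len(feature_map[0]) - pool_size) // pool_size + 1
    pooled = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            window = [feature_map[r * pool_size + i][c * pool_size + j]
                      for i in range(pool_size) for j in range(pool_size)]
            row.append(max(window))  # keep only the largest value in the window
        pooled.append(row)
    return pooled


feature_map = [[1, 3, 2, 1],
               [4, 6, 5, 0],
               [7, 2, 1, 8],
               [3, 0, 9, 4]]
print(max_pool2d(feature_map))  # [[6, 5], [7, 9]]
```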

In the convolution operation of Figure 3, the input image is 4x4 and the feature detector is 3x3. The first 3x3 region of the input matrix is multiplied element-wise by the feature detector, the products are summed, and the result is placed in the first cell of the feature map; in the figure, this value is 2. In the max pooling of Figure 4, the pooling size is set to 2x2, and the largest value in each window is placed in the pooled feature map.
Figure 3. Convolutional operation


Figure 4. Max pooling 
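The size of the feature map in Figure 3 follows from the standard output-size formula for a convolution without padding. As a worked example, assuming a stride $S = 1$ (the text describes sliding the kernel but does not give the step size), an $N \times N$ input and an $F \times F$ kernel give an output width $O$ of:

$$O = \left\lfloor \frac{N - F}{S} \right\rfloor + 1 = \left\lfloor \frac{4 - 3}{1} \right\rfloor + 1 = 2$$

Under that assumption the 3x3 feature detector fits in 2 x 2 = 4 positions, so the feature map would have four cells, the first of which holds the value 2 described above.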


2.2. Model evaluation 

In the model evaluation phase, a confusion matrix was used to verify the accuracy of the predictions and to examine the remaining errors. A confusion matrix shows the performance of a trained model by comparing the actual labels with the model's predictions; accuracy, precision, recall, and F1-score are computed from it. It records the number of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), defined as follows: TP is a sample that the model predicts as positive and that is actually positive; TN is a sample predicted as negative that is actually negative; FP is a sample predicted as positive that is actually negative; and FN is a sample predicted as negative that is actually positive. Accuracy, precision, recall, and F1-score are calculated with (1)-(4) [21], [22].


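$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{1}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{F1-score} = \frac{2 \times (\text{Recall} \times \text{Precision})}{\text{Recall} + \text{Precision}} \tag{4}$$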
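For the 10-class problem, TP, FP, FN, and TN are usually counted per class in a one-vs-rest fashion from the multi-class confusion matrix. The sketch below shows that common approach; it is an assumption, not the authors' code. It takes a matrix whose rows are true labels and whose columns are predicted labels, applies (2)-(4) to each class, and computes overall accuracy as correct predictions (the diagonal) divided by all predictions, which reduces to (1) when there are only two classes. The 2x2 matrix in the example is hypothetical.

```python
def per_class_metrics(cm):
    """Precision, recall, and F1-score per class from a confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    Each class is treated as "positive" and all other classes as "negative".
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true class k, predicted otherwise
        tn = total - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0  # equation (2)
        recall = tp / (tp + fn) if tp + fn else 0.0     # equation (3)
        f1 = (2 * (recall * precision) / (recall + precision)
              if recall + precision else 0.0)           # equation (4)
        results.append({"tp": tp, "fp": fp, "fn": fn, "tn": tn,
                        "precision": precision, "recall": recall, "f1": f1})
    return results


def overall_accuracy(cm):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


cm = [[5, 1],   # hypothetical: 5 of 6 class-0 samples predicted correctly
      [2, 4]]   # hypothetical: 4 of 6 class-1 samples predicted correctly
print(overall_accuracy(cm))  # 0.75
```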


2.3. Data preparation 

Data collection began with handwritten Thai digits from a sample of 200 people, including students, citizens, and personnel from the public and private sectors, who wrote Thai digits on the form shown in Figure 5. The forms were then scanned to PDF files, and each digit was cropped into a single 28x28-pixel image, as shown in Figure 6. A total of 14,950 images were randomly selected to form the Thai digit dataset. The Thai digit dataset is divided into 10 classes, the digits ๐-๙ (0-9), and each class has 1,495 images. The images of each class were then divided into two parts, 1,046 for training and 449 for testing, a ratio of 70:30 (10,460 training and 4,490 testing images in total).
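Because 1,046 + 449 = 1,495, the 70:30 split appears to be applied within each class. The sketch below reproduces those counts with a seeded, per-class (stratified) random split; the paper does not describe its splitting code, so the function name, the seed, and the integer rounding rule are assumptions chosen to match the reported numbers.

```python
import random


def stratified_split(labels, train_percent=70, seed=0):
    """Shuffle and split sample indices class by class.

    Each class keeps the same train/test ratio; the seed makes the split
    reproducible.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for label in sorted(by_class):
        idxs = list(by_class[label])
        rng.shuffle(idxs)
        n_train = len(idxs) * train_percent // 100  # 1495 * 70 // 100 = 1046
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return train, test


# One label per image: 10 classes x 1,495 images = 14,950 images.
labels = [digit for digit in range(10) for _ in range(1495)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 10460 4490
```

Integer arithmetic is used for the per-class count so that floating-point rounding of 1,495 x 0.7 cannot shift the split by one image.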


Figure 5. Thai digit handwritten form

Figure 6. Example of Thai digit dataset
2.4. Model creation

The CNN model is implemented in the Python programming language with the Keras library. It runs on Google Colaboratory, a cloud computing service based on Jupyter Notebooks for machine learning education and research, whose runtime is configured for deep learning and provides free access to powerful GPUs [23]. Table 1 shows the structure of the CNN model as summarized by Keras.


Table 1. Summary of CNN model structure

Layer (type)                      Output shape          Param #
Input                             (None, 28, 28, 3)     0
conv2d (Conv2D)                   (None, 25, 25, 32)    1568
max_pooling2d (MaxPooling2D)      (None, 12, 12, 32)    0
dropout (Dropout)                 (None, 12, 12, 32)    0
conv2d_1 (Conv2D)                 (None, 9, 9, 64)      32832
max_pooling2d_1 (MaxPooling2D)    (None, 4, 4, 64)      0
flatten (Flatten)                 (None, 1024)          0
dense (Dense)                     (None, 1024)          1049600
dense_1 (Dense)                   (None, 10)            10250


As shown in Table 1, the model consists of an input layer, two convolutional (Conv2D) and pooling (MaxPooling2D) layer pairs, and a dropout layer between them. Dropout is a regularization technique for deep learning [24]. The rectified linear unit (ReLU) is used as the activation function in the hidden layers. The flatten layer then transforms the multidimensional data into a vector, which is passed to a fully connected dense layer acting as a hidden layer of the network. The last dense layer uses the Softmax activation function because the output is multi-class. The structure of the model is shown schematically in Figure 7.

In Figure 7, the input image (RGB color) of size 28x28x3 passes through the first convolutional layer, producing a feature map of size 25x25x32. Max pooling then produces an output of size 12x12x32, roughly halving each spatial dimension. After this layer, dropout is applied with a probability of 0.75. The data then pass through a second convolutional and max pooling stage, producing outputs of size 9x9x64 and 4x4x64, respectively, and the flatten layer converts the multidimensional data into one dimension. Finally, a dense layer of 1,024 units is followed by an output dense layer with 10 classes.
Figure 7. Model structure of Thai digit classification with CNN
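The output shapes and parameter counts in Table 1 can be checked by hand. The paper does not state the kernel sizes; the sketch below assumes 4x4 convolution kernels with stride 1 and no padding, and 2x2 max pooling with stride 2, because those settings reproduce the table (28 - 4 + 1 = 25, and (4 x 4 x 3 + 1) x 32 = 1,568 parameters). A convolution has one weight per kernel element and input channel plus one bias per filter; a dense layer has one weight per input-output pair plus one bias per unit. The function names are illustrative.

```python
def out_size(n, window, stride=1):
    """Output length for an unpadded ("valid") convolution or pooling window."""
    return (n - window) // stride + 1


def cnn_summary(size=28, channels=3, kernel=4, pool=2):
    """Return (layer, output_shape, param_count) rows for the Table 1 model."""
    rows = []
    # Conv2D with 32 filters: kernel*kernel*channels weights + 1 bias per filter.
    n = out_size(size, kernel)
    rows.append(("conv2d", (n, n, 32), (kernel * kernel * channels + 1) * 32))
    n = out_size(n, pool, stride=pool)  # pooling has no trainable parameters
    rows.append(("max_pooling2d", (n, n, 32), 0))
    rows.append(("dropout", (n, n, 32), 0))  # dropout keeps the shape
    conv1_params = (kernel * kernel * 32 + 1) * 64
    n = out_size(n, kernel)
    rows.append(("conv2d_1", (n, n, 64), conv1_params))
    n = out_size(n, pool, stride=pool)
    rows.append(("max_pooling2d_1", (n, n, 64), 0))
    flat = n * n * 64
    rows.append(("flatten", (flat,), 0))
    rows.append(("dense", (1024,), (flat + 1) * 1024))
    rows.append(("dense_1", (10,), (1024 + 1) * 10))
    return rows


for row in cnn_summary():
    print(row)
print("total params:", sum(p for _, _, p in cnn_summary()))  # total params: 1094250
```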


2.5. Model training 
Model training is the process of teaching the machine to learn from the data prepared in the data preparation phase. Training is divided into two parts to determine the efficiency of the model.

Aiming for the highest accuracy and the lowest loss, the values for each part were set as follows:

a) Part 1 determines the appropriate number of epochs for training the model, without checking for overfitting. The model was trained with 100, 200, 300, ..., 1000 epochs, a dropout of 0.75, a mini-batch size of 128, a learning rate of 0.0001, and the Adam optimizer, an algorithm for optimizing model training that reduces training and validation loss [25], [26]. The results for each number of epochs are shown in Figure 8.

b) Part 2 sets the maximum number of epochs to 1000 and monitors the model for overfitting, allowing training to stop before the maximum number of epochs once the validation loss has remained greater than or equal to its previous minimum for the number of consecutive epochs given by the patience variable [27]. This technique, implemented with a callback from the Keras EarlyStopping() function, keeps the model from overfitting. The model was trained three times, with the patience variable set to 10, 20, and 30, respectively, and all other variables set as in Part 1. The training results are shown in Table 2 and Figure 9.
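The stopping rule in Part 2 can be illustrated without Keras. In Keras it would typically be configured with a callback such as `keras.callbacks.EarlyStopping(monitor="val_loss", patience=30)` passed through the `callbacks` argument of `model.fit`; the paper does not show its exact arguments, so monitoring `val_loss` is an assumption based on the description above. The plain-Python sketch below mimics the documented behavior (stop once the monitored loss has not improved for `patience` consecutive epochs) on a hypothetical list of validation losses; it is a simplification, not the library's implementation.

```python
import math


def early_stopping_epochs(val_losses, patience, min_delta=0.0):
    """Simulate patience-based early stopping on a sequence of validation losses.

    Returns (epochs_run, best_epoch, best_loss), with best_epoch counted from 0.
    """
    best_loss = math.inf
    best_epoch = -1
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch + 1, best_epoch, best_loss
    return len(val_losses), best_epoch, best_loss


# Hypothetical validation losses: the minimum is reached at epoch index 2.
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
print(*early_stopping_epochs(losses, patience=3))  # 6 2 0.7
```

A larger patience tolerates longer plateaus before stopping, which is why the three runs in Table 2 stop after different numbers of epochs.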


3. RESULTS AND DISCUSSION 
This section presents the results of training the CNN on the Thai digit handwritten dataset, with the objective of finding the best-performing model. The results are divided into two parts.

a) Part 1 covers the results of the model performance evaluation. The prediction accuracy and loss for each number of epochs are shown in Figure 8.
From Figure 8(a) and (b), the prediction accuracy increased up to 400 epochs, reaching 96.93%. After that, the accuracy dropped noticeably, with a later peak of 96.97% at 900 epochs, the highest value overall. The loss decreased up to 400 epochs and then tended to increase. In conclusion, 400 epochs is the best setting for training with a fixed number of epochs because it combines high accuracy with low loss, and increasing the number of epochs also increases the training time.

b) Part 2 sets the maximum number of epochs to 1,000 and checks the model for overfitting with the Keras library's EarlyStopping() function. The highest accuracy and the lowest loss were obtained with a patience of 30: an accuracy of 96.88% and a loss of 0.1075. This run also stopped after fewer epochs (209) than the run with a patience of 20 (224 epochs), which had lower accuracy and higher loss. The results are shown in Table 2, and the accuracy and loss are compared in Figure 9(a) and (b).

The results of Part 2 show that the model's predictive performance was best when the patience parameter was set to 30: training and testing took 45 minutes, the accuracy was the highest at 96.88%, and the loss was the lowest at 0.1075. This is consistent with other studies: [21] used a convolutional neural network (CNN) to classify brain tumor images from magnetic resonance imaging (MRI) and achieved 96.1% accuracy; [28] performed gender classification with a custom convolutional neural network architecture and achieved a classification accuracy of at least 96%; and [29]-[33] reported accuracies between 90% and 98%.

The details of the confusion matrix are shown in Figure 10. Classification errors are caused by similarities in the shapes and writing characteristics of some Thai digits. The precision, recall, and F1-score for each class are shown in Table 3.


[Figure 8 plots: (a) accuracy (%) and (b) loss versus number of epochs (100 to 1000)]

Figure 8. Result of model evaluation (a) accuracy and (b) loss


Table 2. Results of model training

Patience   Epochs   Time (min)   Accuracy (%)   Loss
10         156      30           96.46          0.1172
20         224      49           96.39          0.1235
30         209      45           96.88          0.1075


Indonesian J Elec Eng & Comp Sci, Vol. 27, No. 1, July 2022: 110-117 




[Figure 9 plots: (a) accuracy (%) and (b) loss versus patience (10, 20, 30)]

Figure 9. Performance of model with EarlyStopping() function (a) accuracy and (b) loss


[Figure 10 plot: confusion matrix; axis label "True Label"]

Figure 10. Best confusion matrix of model training


Table 3. Best results of precision, recall, and F1-score

Class     Precision   Recall   F1-score
๐ (0)     94%         98%      96%
๑ (1)     96%         93%      95%
๒ (2)     98%         93%      96%
๓ (3)     99%         100%     99%
๔ (4)     96%         95%      95%
๕ (5)     94%         94%      94%
๖ (6)     96%         99%      97%
๗ (7)     99%         97%      98%
๘ (8)     98%         99%      98%
๙ (9)     100%        99%      99%
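As a consistency check of (4), the F1-scores in Table 3 can be recomputed from the reported precision and recall. Because those values are rounded to whole percentages, small differences are expected: rounding the recomputed values reproduces the reported F1-score for every class except 1 and 2, which come out about half a point lower. The dictionary below simply restates Table 3.

```python
# Precision, recall, and reported F1-score (all in %) restated from Table 3.
TABLE3 = {
    0: (94, 98, 96), 1: (96, 93, 95), 2: (98, 93, 96), 3: (99, 100, 99),
    4: (96, 95, 95), 5: (94, 94, 94), 6: (96, 99, 97), 7: (99, 97, 98),
    8: (98, 99, 98), 9: (100, 99, 99),
}


def f1(precision, recall):
    """Equation (4): harmonic mean of precision and recall."""
    return 2 * (recall * precision) / (recall + precision)


for digit, (p, r, reported) in TABLE3.items():
    print(f"class {digit}: recomputed {f1(p, r):.2f}%, reported {reported}%")
```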


4. CONCLUSION 


We have created a new Thai digit dataset. Starting from a form for writing Thai digits, the handwritten digits were cropped into single digits, and a total of 14,950 images of 28x28 pixels were randomly selected and divided into 10 classes (0-9), each with 1,495 images. The images of each class were then divided into 1,046 training and 449 test images, a ratio of 70:30. To train and test the best-performing model, we set the following parameters: a dropout of 0.75, a mini-batch size of 128, a learning rate of 0.0001, the Adam optimizer, and overfitting checks with the Keras library's EarlyStopping() function with the patience parameter set to 30. With these settings, the model achieved its best results, an accuracy of 96.88% and a loss of 0.1075. Therefore, a convolutional neural network model for classifying images of handwritten Thai digits achieves high prediction accuracy and low loss, similar to the results of other researchers. For future work, it is advisable to experiment with modifying additional parameters to suit the desired task and further increase the model's predictive accuracy.